\(w_{ij}\) is the spatial weight between locations i and j
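To make the binary spatial weights concrete, here is a toy example (a made-up 2×2 grid, not the study data) showing that queen contiguity with `style = "B"` assigns each neighboring pair a weight of 1:

```r
library(sf)
library(spdep)

# Toy example: a 2x2 grid of unit squares (hypothetical data, for illustration)
sq <- st_as_sfc("POLYGON((0 0, 2 0, 2 2, 0 2, 0 0))")
grid <- st_sf(geometry = st_make_grid(sq, n = c(2, 2)))

# Queen contiguity: cells sharing an edge OR a corner are neighbors,
# so in a 2x2 grid every cell neighbors the other three
nb <- poly2nb(grid, queen = TRUE)

# Binary weighting: w_ij = 1 for neighbors, 0 otherwise
w <- nb2listw(nb, style = "B")
card(nb)  # number of neighbors per cell: 3 3 3 3
```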
The Global G Test: Data in July
Getis-Ord global G statistic
data: tes_subset$burn
weights: tes_w_binary
standard deviate = -2.1995, p-value = 0.9861
alternative hypothesis: greater
sample estimates:
Global G statistic Expectation Variance
9.031613e-04 9.054901e-04 1.121059e-12
The output shows a standard deviate of -2.1995: the observed clustering of wildfires is 2.1995 standard deviations below what would be expected under the null hypothesis of no clustering. This value is associated with a p-value of 0.9861, so the observed clustering is not statistically significant at the 0.05 level.
The Global G Test: Data in August
Getis-Ord global G statistic
data: tes_subset$burn
weights: tes_w_binary
standard deviate = 4.1566, p-value = 1.615e-05
alternative hypothesis: greater
sample estimates:
Global G statistic Expectation Variance
8.942119e-04 8.907281e-04 7.024831e-13
The output shows a standard deviate of 4.1566: the observed clustering of wildfires is 4.1566 standard deviations above what would be expected under the null hypothesis of no clustering. This value is associated with a p-value of 1.615e-05, so the observed clustering is statistically significant at the 0.05 level.
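The reported p-values follow directly from the standard deviates: with alternative hypothesis "greater", the p-value is the upper tail of the standard normal distribution at the reported deviate. A quick sanity check in R, using the values from the two outputs above:

```r
# One-sided (alternative = "greater") p-value is the upper tail of the
# standard normal at the reported standard deviate
p_july   <- pnorm(-2.1995, lower.tail = FALSE)  # ~0.9861: not significant
p_august <- pnorm(4.1566,  lower.tail = FALSE)  # ~1.615e-05: significant
round(p_july, 4)
signif(p_august, 4)
```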
The Hotspots in August
Conclusion
Spatial statistics can help identify areas where wildfires cluster (hotspots).
This information is crucial for policymakers to prioritize resources and interventions.
Mapping the spatial distribution of wildfires with Gi* statistics can help policymakers allocate resources more effectively.
Technical Appendix
The full code to replicate this study is in the appendix of the memo.
Part 1: Reading the Data
library(sf)
library(stars)
library(dplyr)
library(sfdep)
library(tidyr)
library(ggplot2)

# Step 1: Turn spherical geometry off
sf::sf_use_s2(FALSE)

# Step 2: Read world borders
borders <- st_read(dsn = "/Users/bgpopescu/Library/CloudStorage/Dropbox/john_cabot/teaching/big_data/week9/data/boundaries/all-country-boundaries.shp", quiet = TRUE)

# Step 3: Simplify borders
borders2 <- st_simplify(borders, dTolerance = 0.1)

# Step 4: Read rasters with wildfires (July)
left_hem <- read_stars("/Users/bgpopescu/Library/CloudStorage/Dropbox/john_cabot/teaching/big_data/example/data/july/july-0000000000-0000000000.tif")
right_hem <- read_stars("/Users/bgpopescu/Library/CloudStorage/Dropbox/john_cabot/teaching/big_data/example/data/july/july-0000000000-0000023296.tif")

# Step 5: Mosaic the July wildfire rasters
july <- st_mosaic(left_hem, right_hem)

# Step 6: Read rasters with wildfires (August)
left_hem <- read_stars("/Users/bgpopescu/Library/CloudStorage/Dropbox/john_cabot/teaching/big_data/example/data/august/august-0000000000-0000000000.tif")
right_hem <- read_stars("/Users/bgpopescu/Library/CloudStorage/Dropbox/john_cabot/teaching/big_data/example/data/august/august-0000000000-0000023296.tif")

# Step 7: Mosaic the August wildfire rasters
august <- st_mosaic(left_hem, right_hem)

# Step 8: Reduce resolution. Higher dx/dy values mean lower resolution
burn2_july <- st_warp(july, st_as_stars(st_bbox(borders), dx = 0.1, dy = 0.1))

# Step 9: Turn the July raster into polygons (grid cells)
burn3_july <- st_as_sf(burn2_july, as_points = FALSE, merge = FALSE)
names(burn3_july)[1] <- "burn"

# Step 10: Reduce resolution for August in the same way
burn2_august <- st_warp(august, st_as_stars(st_bbox(borders), dx = 0.1, dy = 0.1))

# Step 11: Turn the August raster into polygons (grid cells)
burn3_august <- st_as_sf(burn2_august, as_points = FALSE, merge = FALSE)
names(burn3_august)[1] <- "burn"
Part 2: Doing the neighbor analysis: Data July
library(spdep)

# Step 1: Create a neighbor list based on queen contiguity
list_nb <- spdep::poly2nb(burn3_july, queen = TRUE)

# Step 2: Check for empty neighbor sets (polygons without neighbors)
empty_nb <- which(card(list_nb) == 0)

# Step 3: Remove polygons with empty neighbor sets from the data
tes_subset <- burn3_july[-empty_nb, ]

# Step 4: Identify neighbors with queen contiguity (edge/vertex touching)
tes_nb <- poly2nb(tes_subset, queen = TRUE)

# Step 5: Binary weighting assigns a weight of 1 to all neighboring features
# and a weight of 0 to all other features
tes_w_binary <- nb2listw(tes_nb, style = "B")

# Step 6: Calculate the spatial lag for burn
tes_lag <- lag.listw(tes_w_binary, tes_subset$burn)

# Step 7: Test for the global G statistic of burned areas
globalG.test(tes_subset$burn, tes_w_binary)
Getis-Ord global G statistic
data: tes_subset$burn
weights: tes_w_binary
standard deviate = -2.1995, p-value = 0.9861
alternative hypothesis: greater
sample estimates:
Global G statistic Expectation Variance
9.031613e-04 9.054901e-04 1.121059e-12
Part 2: Doing the neighbor analysis: Data August
library(spdep)

# Step 1: Create a neighbor list based on queen contiguity
list_nb <- spdep::poly2nb(burn3_august, queen = TRUE)

# Step 2: Check for empty neighbor sets (polygons without neighbors)
empty_nb <- which(card(list_nb) == 0)

# Step 3: Remove polygons with empty neighbor sets from the data
tes_subset <- burn3_august[-empty_nb, ]

# Step 4: Identify neighbors with queen contiguity (edge/vertex touching)
tes_nb <- poly2nb(tes_subset, queen = TRUE)

# Step 5: Binary weighting assigns a weight of 1 to all neighboring features
# and a weight of 0 to all other features
tes_w_binary <- nb2listw(tes_nb, style = "B")

# Step 6: Calculate the spatial lag for burn
tes_lag <- lag.listw(tes_w_binary, tes_subset$burn)

# Step 7: Test for the global G statistic of burned areas
globalG.test(tes_subset$burn, tes_w_binary)
Getis-Ord global G statistic
data: tes_subset$burn
weights: tes_w_binary
standard deviate = 4.1566, p-value = 1.615e-05
alternative hypothesis: greater
sample estimates:
Global G statistic Expectation Variance
8.942119e-04 8.907281e-04 7.024831e-13
Calculating the Local Gi
We can calculate the local Gi statistic to identify local wildfire hotspots.
# Note: tes_nbs is not defined in the slides; one way to build it is with
# sfdep's st_contiguity()/st_weights(), which add the 'nb' and 'wt'
# list-columns that local_g_perm() expects
tes_nbs <- tes_subset %>%
  mutate(
    nb = st_contiguity(geometry),     # queen contiguity neighbor list
    wt = st_weights(nb, style = "B")  # binary weights, as in the global test
  )

# Calculate the Gi using local_g_perm
tes_hot_spots <- tes_nbs %>%
  mutate(
    Gi = local_g_perm(burn, nb, wt, nsim = 999)  # nsim = number of Monte Carlo simulations
  ) %>%
  # The new 'Gi' column itself contains a data frame;
  # we can't work with that, so we need to 'unnest' it
  unnest(Gi)
Let’s do a cursory visualization of Gi values
Calculating the Local Gi
We can zoom in even further
Final Hotspot in August
tes_hot_spots2 <- subset(tes_hot_spots, select = c(gi, p_folded_sim))

tes_hot_spots3 <- tes_hot_spots2 %>%
  mutate(
    # Add a new column called "classification"
    classification = case_when(
      # Classify based on the following criteria:
      gi > 0 & p_folded_sim <= 0.01 ~ "Very hot",
      gi > 0 & p_folded_sim <= 0.05 ~ "Hot",
      gi > 0 & p_folded_sim <= 0.1  ~ "Somewhat hot",
      gi < 0 & p_folded_sim <= 0.01 ~ "Very cold",
      gi < 0 & p_folded_sim <= 0.05 ~ "Cold",
      gi < 0 & p_folded_sim <= 0.1  ~ "Somewhat cold",
      TRUE ~ "Insignificant"
    ),
    # Convert 'classification' into a factor for easier plotting
    classification = factor(
      classification,
      levels = c("Very hot", "Hot", "Somewhat hot",
                 "Insignificant",
                 "Somewhat cold", "Cold", "Very cold")
    )
  )

ggplot() +
  geom_sf(data = tes_hot_spots3, color = NA, aes(fill = classification)) +
  scale_fill_brewer(type = "div", palette = 5) +
  geom_sf(data = borders, fill = NA) +
  # min_lon and max_lon are assumed to be defined earlier
  coord_sf(xlim = c(min_lon, max_lon), ylim = c(-15, -5), expand = FALSE)
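One detail worth noting in the classification step: case_when() returns the first matching condition, so the stricter thresholds must come before the looser ones. A cell with gi > 0 and p_folded_sim <= 0.01 also satisfies the <= 0.05 and <= 0.1 conditions, but is labeled "Very hot" because that rule is listed first. A toy illustration with made-up gi and p-values (not from the study data):

```r
library(dplyr)

# Made-up values, for illustration only
toy <- tibble(gi = c(2.5, 1.2, -3.0), p_folded_sim = c(0.005, 0.08, 0.02))

toy <- toy %>%
  mutate(classification = case_when(
    gi > 0 & p_folded_sim <= 0.01 ~ "Very hot",     # checked first
    gi > 0 & p_folded_sim <= 0.1  ~ "Somewhat hot",
    gi < 0 & p_folded_sim <= 0.05 ~ "Cold",
    TRUE                          ~ "Insignificant"
  ))

toy$classification  # "Very hot" "Somewhat hot" "Cold"
```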